Skip to content

TransposeLayer#193

Merged
aobolensk merged 7 commits into
mainfrom
Semyon1104/TransposeLayer
Aug 15, 2025
Merged

TransposeLayer#193
aobolensk merged 7 commits into
mainfrom
Semyon1104/TransposeLayer

Conversation

@Semyon1104

Copy link
Copy Markdown
Collaborator

No description provided.

Comment thread src/layers/TransposeLayer.cpp Outdated
}
Shape new_shape(new_dims);

std::vector<float> output_data(input_data->size());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please, check for other data types

Comment thread src/layers/TransposeLayer.cpp Outdated
}

if (perm_.empty()) {
perm_.resize(shape.dims());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we enumerate shapes starting from the end?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted the default to be reverse permutation, but it's probably unnecessary.

@allnes

allnes commented Aug 14, 2025

Copy link
Copy Markdown
Member

Currently, if the layer is constructed with empty perm_, run() fills perm_ once using the first input’s rank. That means calling the same instance later with a different-rank tensor will throw on validate_perm. Prefer computing a local permutation per call when perm_ is empty, e.g.:

void TransposeLayer::run(const Tensor& input, Tensor& output) {
  const auto& shape = input.get_shape();

  std::vector<int64_t> perm = perm_;         // copy member
  if (perm.empty()) {
    perm.resize(shape.dims());
    std::iota(perm.rbegin(), perm.rend(), 0);
  }

  validate_perm(shape /*, perm if you refactor validate_perm to take it */);
  // ... use 'perm' below instead of 'perm_'
}

This keeps the layer usable across inputs of different ranks without surprising statefulness.

@allnes

allnes commented Aug 14, 2025

Copy link
Copy Markdown
Member

Current mapping recomputes old_indices by repeated %// per element. That’s fine for clarity and likely OK at your tensor sizes, but if you profile a hot path, consider:
• Precomputing input/output strides and computing new_index via a dot with mapped coordinates; and/or
• Special-casing 2D transpose to a cache-friendlier kernel.
Not blockers.

@allnes

allnes commented Aug 14, 2025

Copy link
Copy Markdown
Member

run assumes float (input.as(), std::vector). If the framework expects multi-dtype tensors, you’ll want either a templated kernel or a small type-dispatch in run. Tests currently cover only float. Not necessarily required for this PR, but worth tracking.

@allnes

allnes commented Aug 14, 2025

Copy link
Copy Markdown
Member

Reserve capacity for new_dims (new_dims.reserve(shape.dims())).
The out-of-bounds check before writing is defensive; it should be impossible after correct shape math. Keep it or add an assertion instead.

@allnes

allnes commented Aug 14, 2025

Copy link
Copy Markdown
Member

Tests to add (nice-to-have)
• 1D default permutation (rank-1 with empty perm should be a no-op).
• Multiple runs with same instance & different ranks (catches the mutation issue above).

@aobolensk aobolensk merged commit acba1bb into main Aug 15, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants